Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures. While it is well understood that using momentum can lead to faster convergence rates in various settings, it has also been observed that momentum yields better generalization. Prior work argued that momentum stabilizes the SGD noise during training, which leads to better generalization. In this paper, we adopt a different perspective and first show empirically that, compared with gradient descent (GD), gradient descent with momentum (GD+M) significantly improves generalization in some deep learning problems. From this observation, we formally study how momentum improves generalization. We design a binary classification setting in which a single-hidden-layer (over-parameterized) convolutional neural network trained with GD+M generalizes better than the same network trained with GD, when both algorithms are similarly initialized. The key insight in our analysis is that momentum is beneficial on datasets where the examples share some features but have different margins. In contrast to GD, which memorizes the small-margin data, GD+M still learns the features in these data thanks to its historical gradients. Finally, we empirically validate our theoretical findings.
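As a minimal illustration of the GD vs. GD+M comparison above (not code from the paper; the loss and data are placeholders), the sketch below shows that the only difference between the two algorithms is a velocity buffer that accumulates historical gradients.

```python
import torch

def train(params, grad_fn, lr, steps, momentum=0.0):
    """Full-batch gradient descent; momentum=0 gives GD, momentum>0 (e.g. 0.9) gives GD+M."""
    velocity = [torch.zeros_like(p) for p in params]
    for _ in range(steps):
        grads = grad_fn(params)
        for p, g, v in zip(params, grads, velocity):
            v.mul_(momentum).add_(g)   # v <- momentum * v + g: historical gradients accumulate
            p.sub_(lr * v)             # p <- p - lr * v
    return params

# Toy usage on the quadratic loss 0.5 * ||w||^2, whose gradient is simply w.
w_gd  = train([torch.ones(10)], lambda ps: [p.clone() for p in ps], lr=0.1, steps=50)
w_gdm = train([torch.ones(10)], lambda ps: [p.clone() for p in ps], lr=0.1, steps=50, momentum=0.9)
```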
Reliable and automated 3D plant shoot segmentation is a core prerequisite for the extraction of plant phenotypic traits at the organ level. Combining deep learning and point clouds can provide effective ways to address this challenge. However, fully supervised deep learning methods require datasets to be point-wise annotated, which is extremely expensive and time-consuming. In our work, we propose a novel weakly supervised framework, Eff-3DPSeg, for 3D plant shoot segmentation. First, high-resolution point clouds of soybean were reconstructed using a low-cost photogrammetry system, and the Meshlab-based Plant Annotator was developed for plant point cloud annotation. Second, a weakly supervised deep learning method was proposed for plant organ segmentation. The method consists of: (1) pretraining a self-supervised network with a Viewpoint Bottleneck loss to learn meaningful intrinsic structure representations from the raw point clouds; (2) fine-tuning the pretrained model, with only about 0.5% of points annotated, to perform plant organ segmentation. Afterwards, three phenotypic traits (stem diameter, leaf width, and leaf length) were extracted. To test the generality of the proposed method, the public dataset Pheno4D was included in this study. Experimental results showed that the weakly supervised network obtained segmentation performance similar to the fully supervised setting. Our method achieved 95.1%, 96.6%, 95.8%, and 92.2% in Precision, Recall, F1-score, and mIoU for stem-leaf segmentation, and 53%, 62.8%, and 70.3% in AP, AP@25, and AP@50 for leaf instance segmentation. This study provides an effective way to characterize 3D plant architecture, which will be useful for plant breeders to enhance selection processes.
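A hypothetical sketch of the weak-supervision step described above: a point-wise cross-entropy loss evaluated only on the small annotated subset (here ~0.5% of points). The encoder, head, and data below are placeholders, not the actual Eff-3DPSeg backbone or Viewpoint Bottleneck pretraining.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))  # stand-in for the pretrained backbone
head = nn.Linear(64, 3)                                                  # e.g. stem / leaf / background classes

points = torch.randn(100_000, 3)               # one plant point cloud (xyz)
labels = torch.randint(0, 3, (100_000,))       # full labels, of which only a few are "revealed"
labeled_mask = torch.rand(100_000) < 0.005     # ~0.5% of points carry annotations

logits = head(encoder(points))
loss = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])  # supervise only annotated points
loss.backward()
```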
A normalizing flow (NF) is a mapping that transforms a chosen probability distribution to a normal distribution. Such flows are a common technique used for data generation and density estimation in machine learning and data science. The density estimate obtained with an NF requires a change-of-variables formula that involves computing the Jacobian determinant of the NF transformation. In order to tractably compute this determinant, continuous normalizing flows (CNF) estimate the mapping and its Jacobian determinant using a neural ODE. Optimal transport (OT) theory has been successfully used to assist in finding CNFs by formulating them as OT problems with a soft penalty for enforcing the standard normal distribution as a target measure. A drawback of OT-based CNFs is the addition of a hyperparameter, $\alpha$, that controls the strength of the soft penalty and requires significant tuning. We present JKO-Flow, an algorithm to solve OT-based CNFs without the need to tune $\alpha$. This is achieved by integrating the OT CNF framework into a Wasserstein gradient flow framework, also known as the JKO scheme. Instead of tuning $\alpha$, we repeatedly solve the optimization problem for a fixed $\alpha$, effectively performing a JKO update with time-step $\alpha$. Hence we obtain a "divide and conquer" algorithm by repeatedly solving simpler problems instead of solving a potentially harder problem with a large $\alpha$.
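The following toy, particle-level sketch (an assumption, not JKO-Flow itself) conveys the "divide and conquer" idea: instead of one problem with a large penalty, repeatedly solve a proximal problem with a fixed, moderate $\alpha$. The potential $V(x)=\|x\|^2/2$ stands in for the target-matching term; the entropy term and the neural-ODE parameterization are omitted.

```python
import torch

def jko_step(x_prev, alpha, inner_steps=50, lr=0.05):
    """One proximal (JKO-style) update: argmin_x V(x) + ||x - x_prev||^2 / (2 * alpha)."""
    x = x_prev.clone().requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(inner_steps):
        obj = 0.5 * (x ** 2).sum(dim=1).mean() \
              + ((x - x_prev) ** 2).sum(dim=1).mean() / (2 * alpha)
        opt.zero_grad(); obj.backward(); opt.step()
    return x.detach()

particles = 5.0 + torch.randn(512, 2)   # samples from the source distribution
for _ in range(20):                      # outer JKO iterations with a fixed time-step alpha
    particles = jko_step(particles, alpha=0.5)
```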
Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition. In contrast with earlier strategies that alternated between training a model and generating pseudo-labels (PLs) with it, here PLs are generated in an end-to-end manner as training proceeds, improving training speed and the accuracy of the final model. PL shares a common theme with teacher-student models such as distillation, in that a teacher model generates targets that need to be mimicked by the student model being trained. However, interestingly, PL strategies generally use hard labels, whereas distillation uses the distribution over labels as the target to mimic. Inspired by distillation, we expect that specifying the whole distribution (i.e., soft labels) over sequences as the target for unlabeled data, instead of a single best-pass pseudo-labeled transcript (hard labels), should improve PL performance and convergence. Surprisingly, we find that soft-label targets can lead to training divergence, with the model collapsing to a degenerate token distribution per frame. We hypothesize that this does not happen with hard labels because the training loss on hard labels imposes sequence-level consistency that keeps the model from collapsing to the degenerate solution. In this paper, we present several experiments that support this hypothesis, and we experiment with several regularization approaches that can ameliorate the degenerate collapse when using soft labels. These approaches can bring the accuracy of soft labels closer to that of hard labels, and while they are not yet able to outperform them, they serve as a useful framework for further improvements.
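A frame-level sketch (an illustration, not the paper's CTC-based training code) contrasting the two kinds of pseudo-label targets: the hard label is the teacher's best hypothesis, while the soft label is the full per-frame distribution, which the paper finds can cause collapse without extra regularization.

```python
import torch
import torch.nn.functional as F

teacher_logits = torch.randn(200, 32)                     # (frames, vocabulary), from the teacher
student_logits = torch.randn(200, 32, requires_grad=True)

# Hard labels: the teacher's best hypothesis becomes a pseudo-transcript.
hard_targets = teacher_logits.argmax(dim=-1)
loss_hard = F.cross_entropy(student_logits, hard_targets)

# Soft labels: mimic the whole per-frame distribution (distillation-style target).
loss_soft = F.kl_div(F.log_softmax(student_logits, dim=-1),
                     F.softmax(teacher_logits, dim=-1), reduction="batchmean")
```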
Equivariance of neural networks to transformations helps to improve their performance and reduce generalization error in computer vision tasks, as they apply to datasets presenting symmetries (e.g., scalings, rotations, translations). The method of moving frames is a classical technique for deriving operators invariant to the action of a Lie group on a manifold. Recently, a rotation- and translation-equivariant neural network for image data was proposed based on the moving-frames approach. In this paper, we significantly improve that approach by reducing the computation of moving frames to only one, at the input stage, instead of repeated computations at each layer. The equivariance of the resulting architecture is proved theoretically, and we build a rotation- and translation-equivariant neural network to process volumes, i.e., signals on 3D space. Our trained model outperforms the benchmarks in medical volume classification on most of the tested datasets from MedMNIST3D.
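As a rough illustration of "compute the frame only once, at the input stage" (a simplified stand-in, not the paper's moving-frames construction), one can canonicalize a 3D volume by aligning it to the principal axes of its intensity distribution before feeding it to an ordinary network:

```python
import numpy as np
from scipy import ndimage

def canonicalize(volume):
    """Rotate a 3D volume into a frame defined by its intensity principal axes."""
    mask = volume > volume.mean()
    coords = np.stack(np.nonzero(mask), axis=1).astype(float)
    weights = volume[mask]
    center = np.average(coords, axis=0, weights=weights)
    cov = np.cov((coords - center).T, aweights=weights)
    _, R = np.linalg.eigh(cov)             # eigenvectors = principal axes (the "frame")
    offset = center - R.T @ center         # rotate about the volume's weighted center
    return ndimage.affine_transform(volume, R.T, offset=offset, order=1)
```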
Despite legal advances in personal data protection, the problem of private data being misused by unauthorized entities remains critical. To prevent this, privacy by design is often suggested as a data protection solution. In this paper, we study the effect of camera distortions, using deep learning techniques commonly employed to extract sensitive data. To do so, we simulate out-of-focus images corresponding to a realistic camera with fixed focal length, aperture, and focus, as well as grayscale images from a monochrome camera. We then show through an experimental study that we can build a privacy-preserving camera from which personal information (e.g., license plate numbers) cannot be extracted. At the same time, we ensure that useful, non-sensitive data can still be extracted from the distorted images. The code is available at https://github.com/upciti/privacy-by-design-semseg.
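A crude sketch of the defocus simulation idea (the paper models a realistic camera with fixed focal length, aperture, and focus; the disk-shaped "circle of confusion" kernel below is only an approximation of that blur):

```python
import numpy as np
from scipy.ndimage import convolve

def defocus(gray_image, radius=6):
    """Approximate out-of-focus blur by convolving with a normalized disk kernel."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    kernel /= kernel.sum()
    return convolve(gray_image, kernel, mode="reflect")
```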
Attention mechanisms are order-invariant. Positional encoding is an important component that allows attention-based deep architectures such as Transformers to address sequence- or image-based problems. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on a learnable Fourier feature mapping, modulated by a multi-layer perceptron. The representation is particularly suited for spatial, multi-dimensional positions, e.g., pixel positions in an image, where $L_2$ distances or more complex positional relationships need to be captured. Our experiments on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods, both improving accuracy and allowing faster convergence.
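A minimal sketch of the idea (dimensions and initialization are assumptions, not the paper's exact configuration): a multi-dimensional position is passed through a trainable Fourier feature mapping and then modulated by an MLP to produce its encoding.

```python
import math
import torch
import torch.nn as nn

class LearnableFourierPE(nn.Module):
    def __init__(self, pos_dim=2, n_freq=64, d_model=256, hidden=128):
        super().__init__()
        self.W = nn.Parameter(torch.randn(pos_dim, n_freq) * 0.02)  # trainable frequency matrix
        self.mlp = nn.Sequential(nn.Linear(2 * n_freq, hidden), nn.GELU(),
                                 nn.Linear(hidden, d_model))

    def forward(self, pos):                    # pos: (..., pos_dim), e.g. (x, y) pixel coordinates
        proj = 2 * math.pi * pos @ self.W
        feats = torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)  # Fourier feature mapping
        return self.mlp(feats)                 # MLP-modulated positional encoding
```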
A promising trend in deep learning replaces traditional feedforward networks with implicit networks. Unlike traditional networks, implicit networks solve a fixed-point equation to compute inferences. The complexity of solving for the fixed point varies, depending on the data provided and an error tolerance. Importantly, and in stark contrast to feedforward networks whose memory requirements scale linearly with depth, implicit networks can be trained with a fixed memory footprint. However, there is no free lunch: backpropagation through implicit networks often requires solving a costly Jacobian-based equation arising from the implicit function theorem. We propose Jacobian-Free Backpropagation (JFB), a fixed-memory approach that circumvents the need to solve Jacobian-based equations. JFB makes implicit networks faster to train and significantly easier to implement, without sacrificing test accuracy. Our experiments show that implicit networks trained with JFB are competitive with feedforward networks and existing implicit networks given the same number of parameters.
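A minimal sketch of the JFB idea (the layer and fixed-point solver are placeholders): the fixed point is found without building a computation graph, and gradients flow through only one additional application of the layer, avoiding the Jacobian-based linear solve.

```python
import torch
import torch.nn as nn

class ImplicitLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(d, d)

    def f(self, z, x):
        return torch.tanh(self.lin(z) + x)

    def forward(self, x, iters=30):
        z = torch.zeros_like(x)
        with torch.no_grad():              # fixed-point iteration, no graph stored
            for _ in range(iters):
                z = self.f(z, x)
        return self.f(z, x)                # JFB: backpropagate through one application only
```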
While deep feedforward neural networks share some features with the primate visual system, a key difference is their dynamics. Deep networks typically operate in serial stages, where each layer completes its computation before processing begins in subsequent layers. In contrast, biological systems have cascaded dynamics: information propagates from neurons in all layers in parallel, but transmission happens gradually over time, yielding a speed-accuracy trade-off even in feedforward architectures. We explore the consequences of biologically inspired parallel hardware by constructing cascaded ResNets, in which each residual block has a propagation delay but all blocks update in parallel in a stateful manner. Because information transmitted through skip connections avoids the delay, the functional depth of the architecture increases over time, yielding anytime predictions that improve with internal processing time. We introduce a temporal-difference training loss that achieves a strictly superior speed-accuracy profile compared with standard losses and enables the cascaded architecture to be competitive with state-of-the-art anytime prediction methods. The cascaded architecture has intriguing properties, including: it classifies typical instances faster than atypical ones; it is more robust to both persistent and transient noise than a conventional ResNet; and its time-varying output traces provide a signal that can be exploited to improve information processing and inference.
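A toy sketch of cascaded, stateful updating (MLP blocks as stand-ins for residual blocks; not the paper's architecture or temporal-difference loss): every block updates in parallel at each tick from its predecessor's state at the previous tick, and a prediction can be read out at any time.

```python
import torch
import torch.nn as nn

class CascadedNet(nn.Module):
    def __init__(self, d, n_blocks, n_classes):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
                                     for _ in range(n_blocks)])
        self.head = nn.Linear(d, n_classes)

    def forward(self, x, n_ticks):
        states = [x.clone() for _ in self.blocks]          # each block's current output state
        anytime_logits = []
        for _ in range(n_ticks):
            prev = [x] + states[:-1]                       # predecessor states from the last tick
            states = [p + blk(p) for p, blk in zip(prev, self.blocks)]  # parallel residual updates
            anytime_logits.append(self.head(states[-1]))   # anytime prediction after each tick
        return anytime_logits
```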
Artificial intelligence agents must learn from their surroundings and reason about the knowledge they have learned in order to make decisions. While state-of-the-art learning from data typically uses sub-symbolic distributed representations, reasoning is normally useful at a higher level of abstraction, using a first-order logic language for knowledge representation. As a result, attempts to combine symbolic AI and neural computation into neural-symbolic systems have been increasing. In this paper, we present Logic Tensor Networks (LTN), a neural-symbolic formalism and computational model that supports learning and reasoning by introducing a many-valued, end-to-end differentiable first-order logic, called Real Logic, as the representation language for deep learning. We show that LTN provides a uniform language for the specification and computation of several AI tasks such as data clustering, multi-label classification, relational learning, query answering, semi-supervised learning, regression, and embedding learning. We implement and illustrate each of the above tasks with a number of simple explanatory examples using TensorFlow 2. Keywords: neural-symbolic AI, deep learning and reasoning, many-valued logic.
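A minimal sketch of grounding a Real Logic formula (written in PyTorch for illustration; the LTN library itself uses TensorFlow 2, and its exact connectives and aggregators may differ): predicates are neural networks returning truth degrees in [0, 1], connectives are fuzzy-logic operators, and learning maximizes the satisfaction of a knowledge base.

```python
import torch
import torch.nn as nn

def Not(a):         return 1.0 - a
def And(a, b):      return a * b                                        # product t-norm
def Implies(a, b):  return 1.0 - a + a * b                              # Reichenbach implication
def Forall(a, p=2): return 1.0 - ((1.0 - a) ** p).mean() ** (1.0 / p)   # smooth universal quantifier

class Predicate(nn.Module):
    """Learnable predicate: maps an object embedding to a truth degree in [0, 1]."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x).squeeze(-1)

# Toy knowledge base: forall x. A(x) -> B(x); training maximizes its degree of satisfaction.
A, B = Predicate(4), Predicate(4)
x = torch.randn(128, 4)
opt = torch.optim.Adam(list(A.parameters()) + list(B.parameters()), lr=1e-2)
for _ in range(100):
    loss = 1.0 - Forall(Implies(A(x), B(x)))
    opt.zero_grad(); loss.backward(); opt.step()
```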